Turbo Coding and MAP decoding - Part 1


Bayes’ Theorem of conditional probabilities

Let’s first state the theorem of conditional probability, also called Bayes’ theorem.
P(A and B) = P(A) P(B given A)

which we can write in more formal notation as

$$P(A,B) = P(A)\,P(B \mid A) \qquad \text{(A)}$$


where P(B|A) is the probability of event B given that A has already occurred. If A can occur only together with one of a set of mutually exclusive events B, then we can write the following expression for the absolute (marginal) probability of event A.


$$P(A) = \sum_{B} P(A,B) \qquad \text{(B)}$$
If events A and B are independent of each other, then P(B|A) = P(B) and (A) degenerates to
$$P(A,B) = P(A)\,P(B) \qquad \text{(C)}$$


The relationship C is very important and we will use it heavily in the explanation of Turbo 
decoding in this chapter. 
For three events A, B and C, Bayes’ rule conditioned on C becomes
$$P(A,B \mid C) = P(A \mid C)\,P(B \mid A,C) \qquad \text{(D)}$$

A-priori and a-posteriori probabilities 

Here is Bayes' theorem again. 


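$$P(A,B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \qquad \text{(E)}$$

Here P(A,B) is the probability of both A and B, P(A|B) is the a-posteriori probability of event A, P(B) is the a-priori probability of event B, P(B|A) is the a-posteriori probability of event B, and P(A) is the a-priori probability of event A.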
The joint probability of events A and B is given by the probability of A given B times the probability of event B.


The probability of A, or P(A), is the base probability of event A and is called the a-priori probability. The conditional probability P(A|B) is called the a-posteriori probability, or APP. One is an unconditional probability; the other depends on some other event having occurred. We will be using the acronym APP a lot, so make sure you remember that it stands for a-posteriori probability. In other words, the APP of an event is a function of another event also occurring at the same time. We can write (E) as


$$\underbrace{P(A \mid B)}_{\text{APP}} = \frac{P(A,B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}$$


This says that we can determine the APP of event A by multiplying the conditional probability P(B|A) by the a-priori probability P(A), and dividing by the probability P(B) of the event that was observed.
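As a quick numeric illustration of (E), the sketch below computes an APP from a hypothetical a-priori probability and hypothetical likelihoods; P(B) is obtained by summing over A and not-A, as in (B). The numbers and the function name `app` are assumptions chosen only for illustration.

```python
def app(prior_a, p_b_given_a, p_b_given_not_a):
    """A-posteriori probability P(A|B) from Bayes' rule (E).

    prior_a: a-priori probability P(A)
    p_b_given_a: P(B|A)
    p_b_given_not_a: P(B|not A)
    """
    # P(B) by summing the joint probabilities over A and not-A, as in (B)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    # APP = P(B|A) P(A) / P(B)
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers: a 50/50 prior is pulled toward A by an observation
# that is four times likelier under A than under not-A.
print(app(0.5, 0.8, 0.2))  # 0.8
```

The observation moves the a-priori 0.5 to an a-posteriori 0.8, which is exactly the "transition from a-priori statistics to a-posteriori probability" discussed next.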


What these mean is best explained by the following two quotes. 


“In epistemological terms, ‘a priori’ and ‘a posteriori’ refer primarily to how, or on what basis, a proposition might be known. In general terms, a proposition is knowable a priori if it is knowable independently of experience, while a proposition knowable a posteriori is knowable on the basis of experience. The distinction between a priori and a posteriori knowledge thus broadly corresponds to the distinction between empirical and non-empirical knowledge.” [2]


“But how do we decide when we have gathered enough data to justify modifying our 
prediction of the probabilities? That is one of the essential problems of decision theory. 
How do we make the transition from a priori statistics to a posteriori probability?” [3] 
The MAP algorithm helps us make the transition from a-priori knowledge to knowledge based on received data.


Structure of a Turbo Code 


According to Shannon, the ultimate code would be one where a message is sent an infinite number of times, each time shuffled randomly. The receiver has infinitely many versions of the message, albeit corrupted randomly. From these copies, the decoder would be able to decode the message sent with near error-free probability. This is the theory of an ultimate code, one that can correct virtually all errors.


Turbo codes are a step in that direction. But it turns out that for acceptable performance we do not really need to send the information an infinite number of times; sending it just two or three times provides pretty decent results for our earthly channels.


In Turbo codes, particularly the parallel structure, recursive systematic convolutional (RSC) codes working in parallel are used to create the “random” versions of the message.


The parallel structure uses two or more RSC codes, each with a different interleaver. The purpose of the interleaver is to offer each encoder an uncorrelated, or “random”, version of the information, resulting in parity bits from each RSC that are independent. How “independent” these parity bits are is essentially a function of the type and length/depth of the interleaver. Interleaver design is a science in itself. In a typical Viterbi code, the messages are decoded in blocks of only about 200 bits or so, whereas in Turbo coding the blocks are on the order of 16K bits long. The reason for this length is to effectively randomize the sequence going to the second encoder. The longer the block length, the lower the correlation between the sequence going to the second encoder and the message going to the first encoder.


On the receiving side, there are as many decoders as there are encoders, each working on the same systematic information and an independent set of parity bits. This type of structure is called a Parallel Concatenated Convolutional Code, or PCCC.


The convolutional codes used in turbo codes usually have a small constraint length. Whereas a longer constraint length is an advantage in stand-alone convolutional codes, in turbo codes it does not lead to better performance, and it increases computational complexity and delay. The codes in a PCCC must be RSC. The RSC property lets the systematic bit serve as a common reference, whose reliability is then assessed using the independent parity bits from the different encoders. The decoding most often applied is an iterative form of decoding.


When we have two such codes, the signal produced is rate 1/3. If there are three encoders, then the rate is 1/4, and so on. Usually two encoders are enough, as increasing the number of encoders reduces bandwidth efficiency and does not buy a proportionate increase in performance.
Figure 1 - A rate 1/(n+1) Parallel Concatenated Convolutional Code (PCCC) Turbo Code


Turbo codes also come as Serial Concatenated Convolutional Codes, or SCCC. The SCCC codes appear to have better performance at higher SNRs. Whereas PCCC codes require both constituent codes to be RSC, in SCCC only the inner code must be RSC. PCCC codes also seem to show a flattening of performance around a bit error rate of 10^-6, which is less evident in SCCC. The SCCC constituent code rates can also be different, as shown below. The outer code can even be a block code.


In general, the PCCC is a special form of SCCC. We can even think of the concatenation of RS and convolutional codes, used in line-of-sight links, as a form of SCCC. A Turbo SCCC may look like the figure below, with constituent codes of different rates.


Constituent encoders: EC1 (rate 1/2) and EC2 (rate 2/3)


Figure 2 - Serially concatenated constituent coding (SCCC) 


Then there are also hybrid versions that use both PCCC and SCCC, such as the one shown in the figure below.


Figure 3 - Hybrid Turbo Codes 


There is another form called the Turbo Product Code, or TPC. This form has a very different structure from the PCCC or SCCC. TPCs use block codes instead of convolutional codes. Two different block codes (usually Hamming codes) are concatenated serially without an interleaver in between. Since the two codes are independent and operate on rows and columns respectively, this alone offers enough randomization that no interleaver is required. TPC codes, like PCCC, also perform well at low SNR and can be formed by concatenating any type of block code. The typical coding method is to array the coded data in rows; the second code then uses the columns of this new array for its coding. The following shows a TPC code created from a (7x5) and an (8x4) Hamming code. The 8x4 code first codes 4 info bits into 8 by adding 4 p1 parity bits. Five such codewords are arrayed in five rows. Then the 7x5 code works on these in columns and creates new parity bits p2 for each column (in this case, both codes are systematic). The net code rate is (4/8)(5/7) = 5/14 ≈ 0.36. The decoding is done along rows and then columns.
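The rate arithmetic above can be checked in a few lines. This sketch only computes the rate of a product code from (n, k) row and column codes; it does not encode anything, and the function name is hypothetical.

```python
from fractions import Fraction

def product_code_rate(row_code, col_code):
    """Rate of a product code built from (n, k) row and column block codes."""
    n_row, k_row = row_code          # row code: k_row info bits -> n_row coded bits
    n_col, k_col = col_code          # column code: k_col rows -> n_col rows
    info_bits = k_row * k_col        # information array: k_col rows by k_row columns
    total_bits = n_row * n_col       # final array: n_col rows by n_row columns
    return Fraction(info_bits, total_bits)

rate = product_code_rate((8, 4), (7, 5))
print(rate, round(float(rate), 3))   # 5/14 0.357
```

The product of the two component rates, 1/2 × 5/7, gives the same 5/14, since every information bit passes through both codes.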


Row code: Hamming (8x4) → 8x5 array → column code: Hamming (7x5) → 7x8 array


Figure 4 - Turbo Product codes 


What makes all these codes “Turbo” is not their structure but a form of feedback iterative decoding. If an SCCC structure does not use iterative decoding, then it is just a plain old concatenated code, not a turbo code.


Maximum a-posteriori Probability (MAP) decoding algorithm 


Turbo codes are decoded using a method called Maximum Likelihood Detection, or MLD. The filtered signal is fed to the decoders, which work on the signal amplitudes to output a soft “decision”. The a-priori probabilities of the input symbols are used, and a soft output indicating the reliability of the decision (amounting to a suggestion from decoder 1 to decoder 2) is calculated, which is then iterated between the two decoders.


The form of MLD decoding used by turbo codes is called Maximum a-posteriori Probability, or MAP. In communications, this algorithm was first described by Bahl, Cocke, Jelinek and Raviv, and BCJR is the name by which it is known in turbo applications. The MAP algorithm is related to many other algorithms, such as the Hidden Markov Model (HMM), which is used in voice recognition, genomics and music processing. Other similar algorithms are the Baum-Welch algorithm, Expectation Maximization, the Forward-Backward algorithm, and more. MAP is a complex algorithm, hard to understand and hard to explain.
In addition to the MAP algorithm, another algorithm called SOVA (Soft Output Viterbi Algorithm), based on Viterbi decoding, is also used. SOVA uses the Viterbi decoding method but with soft outputs instead of hard ones. SOVA maximizes the probability of the sequence, whereas MAP maximizes the bit probabilities at each time, even if the resulting sequence is not a legal one. MAP produces near-optimal decoding.


In turbo codes, the MAP algorithm is used iteratively to improve performance. It is like the game of 20 questions, where each previous guess helps to improve your knowledge of the hidden information. The number of iterations is often preset, as in 20 questions. More iterations are needed when the SNR is low; when the SNR is high, fewer iterations are required since the results converge quickly. Doing 20 iterations may be a waste if the signal quality is good. Instead of deciding ad hoc, the algorithm is often preset with a number of iterations. On average, seven iterations give adequate results, and no more than 20 are ever required. These numbers have a relationship to the Central Limit Theorem.


Figure 5 - Iterative decoding in MAP algorithm 


Although used together, the terms MAP and iterative decoding are separate concepts. The MAP algorithm refers to a specific computation. The iterative process, on the other hand, can be applied to any type of coding, including block coding, which is not trellis based and may not use the MAP algorithm.


I am going to concentrate only on PCCC decoding using the iterative MAP algorithm. In Part 2, we will go through a step-by-step example. In this part, we will cover the theory of the MAP algorithm.


We are going to describe MAP decoding using the Turbo code shown in Figure 6, with two RSC encoders. Each RSC has two memory registers, so the trellis has four states, with a constraint length equal to 3.
Figure 6 - A rate 1/3 PCCC Turbo code in an 8PSK channel


The rate 1/3 code shown here has two identical RSC convolutional codes. The coding trellis for each is given in Figure 7. The blue lines show transitions in response to a 0 and the red lines in response to a 1. In the notation 1/11, the first digit is the input bit and the next two are the code bits. Of these, the first is what we called the systematic bit, and as you can see it is the same as the input bit. The second is the parity bit. Each code uses the same trellis for encoding. The labels along the branches can be read as $u_k / c_k^{1,s}, c_k^{1,p}$. Since this is an RSC, the first code bit is the same as the information bit, or $u_k = c_k^{1,s}$.
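The generator polynomials of the code in Figure 7 are not given in the text, so the sketch below assumes a common four-state RSC: feedback 1 + D + D^2 (7 octal) and feedforward 1 + D^2 (5 octal). It is meant only to show how the systematic bit copies the input while the parity bit comes from the recursive register; it is not necessarily the code of Figure 7, and the function names are hypothetical.

```python
def rsc_encode(bits):
    """Rate 1/2 RSC encoder; returns (systematic, parity) bit lists.

    Assumed (hypothetical) generators: feedback 1 + D + D^2 (7 octal),
    feedforward 1 + D^2 (5 octal); two memory registers, four states.
    """
    s1 = s2 = 0                      # memory registers; (0, 0) is the start state
    systematic, parity = [], []
    for u in bits:
        a = u ^ s1 ^ s2              # recursive (feedback) bit
        systematic.append(u)         # first code bit equals the information bit
        parity.append(a ^ s2)        # second code bit from the feedforward taps
        s1, s2 = a, s1               # shift the registers
    return systematic, parity

def trellis():
    """Every branch as (start state, input, end state, 'u/cs cp' label)."""
    branches = []
    for s1 in (0, 1):
        for s2 in (0, 1):
            for u in (0, 1):
                a = u ^ s1 ^ s2
                branches.append(((s1, s2), u, (a, s1), f"{u}/{u}{a ^ s2}"))
    return branches

print(rsc_encode([1, 0, 0, 0]))   # ([1, 0, 0, 0], [1, 1, 1, 0])
```

Because of the feedback, a single 1 at the input keeps producing parity 1s after the input returns to 0, which is the "recursive" behavior that distinguishes an RSC from a feedforward convolutional code.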


The info bits are called $u_k$. The coded bits are referred to by the vector $\mathbf{c}$. The coded bits are then transformed to an analog symbol $\mathbf{x}$ and transmitted. On the receive side, a noisy version of $\mathbf{x}$ is received. By looking at how far the received symbol is from the decision regions, a metric of confidence is assigned to each of the three bits in the symbol. Often Gray coding is used, which means that not all bits in the symbol have the same level of confidence for decoding purposes. There are special algorithms for mapping each symbol (one received voltage value) to $\log_2 M$ soft decisions, with M being the M in M-PSK. Let’s assume that after the mapping and creation of soft metrics, the vector $\mathbf{y}$ is received. One pair of these soft bits is sent to the first decoder, and another pair, made of an interleaved version of the systematic bit and the second parity bit, is sent to the second decoder.


Each decoder works only on these bits of information, and the two decoders pass their confidence scores to each other until both agree within a certain threshold. Then the process restarts with the next block of N symbols (or bits).


Definitions 


N is the frame size in transmitted symbols. So for 8PSK, there would be 3N bits transmitted per frame, and one third of these are information bits. In a typical Turbo code, there may be as many as 20,000 symbols in a frame.
The sequence of information bits is given by $\mathbf{u}$. The first encoder gets $\mathbf{u} = (u_1, u_2, u_3, \ldots, u_N)$ and the second encoder gets a preset reordering of this same sequence. For example, we may pick a mapping function such as the following.


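$$\mathbf{u}_1 = (u_1, u_2, u_3, \ldots, u_N)$$

$$\mathbf{u}_2 = (u_{\pi(1)}, u_{\pi(2)}, u_{\pi(3)}, \ldots, u_{\pi(N)})$$

where $\pi(\cdot)$ is the interleaver’s reordering of the indices.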
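A minimal sketch of such a reordering, assuming a pseudo-random permutation generated with a fixed seed. Practical turbo codes use carefully designed interleavers; the seed and the function names here are hypothetical. The de-interleaver undoes the permutation, which is what the receiver needs when passing values between the two decoders.

```python
import random

def make_interleaver(n, seed=1):
    """Hypothetical pseudo-random interleaver: a fixed permutation pi of 0..n-1."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm

def interleave(seq, perm):
    """Reorder seq so that output position i holds seq[pi(i)]."""
    return [seq[p] for p in perm]

def deinterleave(seq, perm):
    """Undo interleave(): put element i back at position pi(i)."""
    out = [None] * len(seq)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out

u = [1, 0, 1, 1, 0, 0, 1, 0]
perm = make_interleaver(len(u))
u2 = interleave(u, perm)          # sequence seen by the second encoder
assert deinterleave(u2, perm) == u
```

Using a seeded generator means the transmitter and receiver can build the same permutation independently, which is the "preset" part of the reordering.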


The encoder mapping is given by the vector $\mathbf{c}$. $\mathbf{c}_k^1 = (c_k^{1,s}, c_k^{1,p})$ are the two bits produced by the first encoder, and $\mathbf{c}_k^2 = (c_k^{2,s}, c_k^{2,p})$ are the two bits produced by the second encoder.


The mapping of information bits to code bits is done by a trellis such as the one described in Table I.


Table I - Mapping of information bits to code bits 


Figure 7 - The trellis diagram of the rate 1/2 RSC code


The symbol described by vector $\mathbf{x}$ (3 bits per vector) is sent for each time k. Let’s call this vector $\mathbf{x}_k$. There would be N of these symbols transmitted.


$$\mathbf{x}_1 = [1, 1, 1], \qquad \mathbf{x}_2 = [-1, 1, -1]$$
$\mathbf{y}_k = (y_k^{1,s}, y_k^{1,p}, y_k^{2,p})$ are the soft-mapped bits. The goal is to take these and make a guess about the transmitted $\mathbf{x}$ vector, and hence the code bits, which in turn decode $u_k$, the information bit. Of these three soft bits, each decoder gets just two. The first decoder gets $\mathbf{y}_k^1 = (y_k^{1,s}, y_k^{1,p})$ and the second decoder gets $\mathbf{y}_k^2 = (y_k^{2,s}, y_k^{2,p})$, which are their respective received data. Each decoder works with just two values. The second decoder, however, gets a reordered version of the systematic bit, so it is getting only one bit of independent information.


Log-likelihood Ratio (LLR) 


Let’s take the information bit $u_k$, a binary variable, where $u_k$ is its value at time k. Its Log-likelihood Ratio (LLR) is defined as the natural log of the ratio of its two probabilities.

$$L(u_k) = \ln\frac{P(u_k = +1)}{P(u_k = -1)} \qquad (1.1)$$


If u has two values, +1 and -1 volts, representing the 0 and 1 bits, and these are equally likely, as they are in most communication systems, then the ratio is 1 and the LLR is zero. This metric is used in most error correction coding and is called the log-likelihood ratio, or LLR. It is a sensitive metric, quite a bit better than a linear metric. Logs make it easy to deal with very small and very large numbers, as you well know.


Now note what happens to this metric if the binary variable is not equally likely, as happens in trellis decoding. From elementary probability theory, we know that the probabilities of all outcomes of an event add to 1. So we can write the probability of u = -1 as 1 minus the probability of u = +1.


$$P(u_k = -1) = 1 - P(u_k = +1)$$

Now, using this relation, rewrite the LLR of (1.1) as

$$L(u_k) = \ln\frac{P(u_k = +1)}{1 - P(u_k = +1)} \qquad (1.2)$$


Equation (1.2) is plotted in Figure 8 as a function of the probability of one of the events. Let’s say we are given $L(u_k) = 1.0$. Then the probability that $u_k = +1$ is $e/(1+e) \approx 0.73$.


Figure 8 - The range of the LLR is from -∞ to +∞, and it is a direct indication of the bit.


As we can see, the sign of the LLR, a dimensionless metric, is a good indicator of the bit, and its magnitude tells us how reliable that indication is. This is an even better indicator than the Euclidean distance since its range is so large. The LLR is a very useful parameter, and we will see how it is used in Turbo decoding.
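The relationship in (1.2) and its inverse, used in the $L(u_k) = 1.0$ example above, can be sketched as follows. The inverse is written in a numerically stable form so that large negative LLRs do not overflow `math.exp`; the function names are illustrative.

```python
import math

def llr(p_plus):
    """LLR of a binary variable, (1.2): ln(P(+1) / (1 - P(+1)))."""
    return math.log(p_plus / (1.0 - p_plus))

def prob_plus(L):
    """Inverse of (1.2): P(u = +1) = e^L / (1 + e^L), computed stably."""
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    e = math.exp(L)                 # small for very negative L, never overflows
    return e / (1.0 + e)

print(round(prob_plus(1.0), 2))   # 0.73
print(llr(0.5))                   # 0.0
```

A positive LLR maps to a probability above 0.5 and a negative one below it, which is why only the sign is needed for a hard decision.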


Now we formulate the LLR of a decoded bit $u_k$ at time k, conditioned on the received signal, a sequence of N bits. The lower index of $\mathbf{y}$ means the sequence starts at time 1, and the upper index means it ends at time N.

$$L(u_k) = \ln\frac{P(u_k = +1 \mid \mathbf{y}_1^N)}{P(u_k = -1 \mid \mathbf{y}_1^N)} \qquad (1.2)$$
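We can reformulate the numerator and denominator using Bayes’ rule (A).

$$L(u_k) = \ln\frac{P(\mathbf{y}_1^N, u_k = +1)/P(\mathbf{y}_1^N)}{P(\mathbf{y}_1^N, u_k = -1)/P(\mathbf{y}_1^N)} = \ln\frac{P(\mathbf{y}_1^N, u_k = +1)}{P(\mathbf{y}_1^N, u_k = -1)} \qquad (1.3)$$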


This formulation of the LLR includes joint probabilities between the received bits and the information bit, the numerator of which can be written as (1.4). For an RSC trellis, each transition is uniquely specified by any pair of these: 1. the start state s’, 2. the ending state s, 3. the input bit. If we know any two of these pieces of information, we can identify the correct transition without error. In this equation, we only know the whole received sequence $\mathbf{y}_1^N$; we do not know $u_k$, nor do we have any of the other pieces of information. But what we do know is that saying $u_k = +1$ is the same as saying that the correct transition is one of the four possible transitions shown in Figure 9.


So the joint probability $P(\mathbf{y}_1^N, u_k = +1)$ (the probability that the N-bit sequence is received and that $u_k = +1$ at time k) can be written by replacing the information bit with the starting and ending states. These formulations are equivalent.
$$P(u_k = +1, \mathbf{y}_1^N) = \sum_{(s', s):\, u_k = +1} P(s', s, \mathbf{y}_1^N) \qquad (1.4)$$


Figure 9 - Trellis transitions possible at any time k in response to a +1 and a -1.


We have rewritten the left-hand side of the equation by replacing $u_k = +1$ with two states, s’ and s. Here s’ is the starting state and s is the ending state for all allowable transitions for which $u_k = +1$. There are four valid transitions related to decoding a +1. Assume that there is a probability associated with each of these four transitions. This says that the probability of making a decoding decision in favor of a +1 is the sum of the probabilities of all four of these possible transitions.


Now plug (1.4) into (1.3) to get the log-likelihood ratio of $u_k$ as

$$L(u_k) = \ln\frac{P(\mathbf{y}_1^N, u_k = +1)}{P(\mathbf{y}_1^N, u_k = -1)} = \ln\frac{\sum_{u_k = +1} P(s', s, \mathbf{y}_1^N)}{\sum_{u_k = -1} P(s', s, \mathbf{y}_1^N)} \qquad (1.5)$$
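For a very short block, (1.3) can be evaluated directly by enumerating every possible input sequence, which makes the meaning of the sums concrete; the trellis form exists precisely because this enumeration grows as 2^N. The sketch assumes a hypothetical four-state RSC (feedback 7, feedforward 5 octal) in place of the Table I code, equiprobable inputs, bit 1 sent as +1 and bit 0 as -1, and an AWGN channel with variance `sigma2`. Common factors such as $P(\mathbf{y})$ and the Gaussian normalization cancel in the ratio and are dropped.

```python
import itertools
import math

def rsc_parity(bits):
    """Parity bits of a hypothetical 4-state RSC (feedback 7, feedforward 5 octal)."""
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2              # recursive feedback bit
        parity.append(a ^ s2)        # feedforward taps 1 + D^2
        s1, s2 = a, s1
    return parity

def brute_force_llr(y_s, y_p, sigma2):
    """L(u_k) = ln P(y, u_k=+1) / P(y, u_k=-1), as in (1.3), by full enumeration."""
    n = len(y_s)
    joint = [[0.0, 0.0] for _ in range(n)]   # joint[k][bit] is proportional to P(y, u_k = bit)
    for bits in itertools.product((0, 1), repeat=n):
        parity = rsc_parity(bits)
        # squared Euclidean distance between y and this codeword (bit 1 -> +1, bit 0 -> -1)
        dist = sum((y - (2 * b - 1)) ** 2 for y, b in zip(y_s, bits))
        dist += sum((y - (2 * p - 1)) ** 2 for y, p in zip(y_p, parity))
        weight = math.exp(-dist / (2 * sigma2))   # P(y | codeword), up to a common constant
        for k, b in enumerate(bits):
            joint[k][b] += weight
    return [math.log(j[1] / j[0]) for j in joint]

bits = [1, 0, 1, 1, 0]
y_s = [2.0 * b - 1 for b in bits]                 # noiseless, for illustration
y_p = [2.0 * p - 1 for p in rsc_parity(bits)]
L = brute_force_llr(y_s, y_p, sigma2=0.5)
print([1 if l > 0 else 0 for l in L])             # [1, 0, 1, 1, 0]
```

Each codeword contributes to exactly one of the two sums at every position k, which is the same split into "+1 transitions" and "-1 transitions" that (1.5) expresses with trellis states.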


Now we need to make one more conceptual leap. The probability of deciding which road was taken, i.e. the +1 road or the -1 road, is a function of where the path started and where it will end, which are usually given as boundary conditions. If we split the whole received sequence into manageable parts, it may help us identify the starting or ending states. We incorporate these ideas into (1.5) to get

$$P(s', s, \mathbf{y}_1^N) = P(s', s, \mathbf{y}_1^{k-1}, \mathbf{y}_k, \mathbf{y}_{k+1}^N)$$


We take the N-bit sequence and separate it into three pieces: from 1 to k-1, then the kth point, and then from k+1 to N. We adopt a slightly easier notation in (1.6).

$$P(s', s, \mathbf{y}) = P(s', s, \mathbf{y}_p, \mathbf{y}_k, \mathbf{y}_f) \qquad (1.6)$$
$\mathbf{y}_p$: the past sequence, the part that came before the current data point.
$\mathbf{y}_k$: the current data point.
$\mathbf{y}_f$: the sequence points that come after the current point.

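Rewrite (1.6) using Bayes’ rule of joint probabilities (D).

$$P(s', s, \mathbf{y}) = P(s', s, \mathbf{y}_p, \mathbf{y}_k, \mathbf{y}_f) = P(\mathbf{y}_f \mid s', s, \mathbf{y}_p, \mathbf{y}_k)\,P(s', s, \mathbf{y}_p, \mathbf{y}_k) \qquad (1.7)$$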
This looks complicated, but we are just applying Bayes’ rule to simplify (1.6) and to break down the terms into past, present and future parts of the sequence, so that whenever we are making a decision about a bit, +1 or -1, a cumulative metric takes into account the starting and the ending points of the sequence. The starting and ending points of a convolutional sequence are known, and we use this information to single out the likelier sequences.


The term $\mathbf{y}_f$ is the future sequence, and we assume that it is independent of the past and depends only on the present state s. We can remove these dependencies to simplify (1.7).


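$$P(s', s, \mathbf{y}) = P(\mathbf{y}_f \mid s', s, \mathbf{y}_p, \mathbf{y}_k)\,P(s', s, \mathbf{y}_p, \mathbf{y}_k) = P(\mathbf{y}_f \mid s)\,P(s', s, \mathbf{y}_p, \mathbf{y}_k) \qquad (1.8)$$

Now apply Bayes’ rule to the last term in (1.8) to get

$$P(s', s, \mathbf{y}_p, \mathbf{y}_k) = P(s, \mathbf{y}_k \mid s', \mathbf{y}_p)\,P(s', \mathbf{y}_p)$$

$$P(s', s, \mathbf{y}) = P(\mathbf{y}_f \mid s)\,P(s, \mathbf{y}_k \mid s', \mathbf{y}_p)\,P(s', \mathbf{y}_p) \qquad (1.9)$$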


Now define a few new terms. 
$$\alpha_{k-1}(s') = P(s', \mathbf{y}_p)$$

$$\beta_k(s) = P(\mathbf{y}_f \mid s) \qquad (1.10)$$

$$\gamma_k(s', s) = P(s, \mathbf{y}_k \mid s', \mathbf{y}_p)$$


The terms α, β and γ defined above are the metrics we will compute in MAP decoding. Now each term in the sums of (1.5) becomes


$$P(s', s, \mathbf{y}) = \alpha_{k-1}(s')\,\beta_k(s)\,\gamma_k(s', s) \qquad (1.11)$$


We can plug this into (1.5) to get the LLR equation for MAP algorithm. 


$$L(u_k) = \ln\frac{\sum_{u_k = +1} \alpha_{k-1}(s')\,\beta_k(s)\,\gamma_k(s', s)}{\sum_{u_k = -1} \alpha_{k-1}(s')\,\beta_k(s)\,\gamma_k(s', s)} \qquad (1.12)$$


$\alpha_{k-1}(s')$, the first term in (1.11), is called the Forward metric.
$\beta_k(s)$ is called the Backward metric.
$\gamma_k(s', s)$ is called the Transition metric.


The MAP algorithm allows us to compute these metrics. A decision is made at each time k about the transmitted bit $u_k$. Unlike Viterbi decoding, where the decision is made in favor of the likeliest sequence by carrying forward the sum of metrics, here the decision is made for the likeliest bit.


The MAP algorithm is also known as the Forward-Backward Algorithm because we assess probabilities in both forward and backward time.


How to calculate the Forward metrics 


We will start with the forward metric at time k. 


$$\alpha_k(s) = \sum_{\text{all } s'} P(s', s, \mathbf{y}_p, \mathbf{y}_k)$$


This is also equal to the sum, over all previous states s’, of the forward metric at s’ times the transition metric to the current state s.

$$\alpha_k(s) = \sum_{\text{all } s'} P(s', s, \mathbf{y}_p, \mathbf{y}_k) = \sum_{\text{all } s'} \gamma_k(s', s)\,\alpha_{k-1}(s') \qquad (1.13)$$


The initial condition is that, since we always start at state 1,

$$\alpha_0(s) = \begin{cases} 1 & \text{if } s = 1 \\ 0 & \text{otherwise} \end{cases}$$


Example: $\alpha_0(1) = 1.0$ and $\alpha_0(2) = \alpha_0(3) = \alpha_0(4) = 0$, with transition metrics such as 4.4817 and 0.2231 on the branches of the first trellis section.


Figure 10 - Computing Forward metrics 


The leftmost value at the top is the starting point, so it has a value of $\alpha_0(1) = 1$; the starting value of α is 0 at all the remaining three states. The forward metric at time k is

$$\alpha_k(s) = \sum_{\text{all } s'} \gamma_k(s', s)\,\alpha_{k-1}(s')$$

so at k = 1,

$$\alpha_1(1) = 4.4817 \times 1.0 = 4.4817$$

$$\alpha_1(2) = 0.2231 \times 1.0 + 4.4817 \times 0.0 = 0.2231$$


Notice that these numbers are larger than 1, which means they are not probabilities; they are called metrics. It also means that, since a typical Turbo code frame is thousands of trellis sections long, multiplication of these numbers may result in numerical overflow. For that reason, both α and β are normalized.
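A minimal sketch of the forward recursion (1.13), normalizing by the sum over states at every step. The branch metrics reuse the 4.4817 and 0.2231 values from the example above, but which states they connect is assumed for illustration. States are numbered from 0 in the code, so code state 0 is the document’s state 1; the function name is hypothetical.

```python
def forward_metrics(gammas, n_states, start_state=0):
    """Normalized forward recursion (1.13).

    gammas[k-1] maps (s_prev, s) -> gamma_k(s', s) for trellis section k = 1..N.
    Returns [alpha_0, alpha_1, ..., alpha_N], each normalized to sum to 1.
    """
    alpha = [0.0] * n_states
    alpha[start_state] = 1.0              # alpha_0(s): 1 at the start state, else 0
    alphas = [alpha]
    for gamma in gammas:
        nxt = [0.0] * n_states
        for (s_prev, s), g in gamma.items():
            nxt[s] += alphas[-1][s_prev] * g    # sum over s' of alpha_{k-1}(s') gamma_k(s', s)
        total = sum(nxt)
        alphas.append([a / total for a in nxt]) # divide by the sum over states
    return alphas

# Hypothetical first section: from the start state to states 0 and 1,
# plus a branch from state 2, whose alpha_0 is zero.
gamma_1 = {(0, 0): 4.4817, (0, 1): 0.2231, (2, 1): 4.4817}
alphas = forward_metrics([gamma_1], n_states=4)
print([round(a, 4) for a in alphas[1]])   # [0.9526, 0.0474, 0.0, 0.0]
```

Before normalization the values are 4.4817 and 0.2231, as in the text; dividing by their sum, 4.7048, keeps them between 0 and 1 without changing their ratio.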


Backward Metric $\beta_k(s)$
As with the initial value of the forward metric, $\alpha_0(1) = 1$, we assume that since the trellis always ends at state 1, the backward metrics at the last time N are equal to 1 or 0:

$$\beta_N(1) = 1, \qquad \beta_N(s) = 0 \text{ elsewhere.}$$


The backward metric at time k-1 is the sum, over all next states s, of the transition metric times the backward metric at time k, working backwards from the last state.

$$\beta_{k-1}(s') = \sum_{\text{all } s} \beta_k(s)\,\gamma_k(s', s) \qquad (1.14)$$


Compare this with the forward metric equation.

$$\alpha_k(s) = \sum_{\text{all } s'} \alpha_{k-1}(s')\,\gamma_k(s', s)$$


Initial backward metrics: $\beta_N(1) = 1.0$, $\beta_N(2) = \beta_N(3) = \beta_N(4) = 0$.


Figure 11 - Computing Backward metrics 


$$\beta_{k-1}(s') = \sum_{\text{all } s} \beta_k(s)\,\gamma_k(s', s)$$

$$\beta_{N-1}(1) = 1.0 \times 20.0855 = 20.0855$$

$$\beta_{N-1}(2) = 1.0 \times 0.0498 = 0.0498$$


You can see these values for states 1 and 2 in Figure 11. These are then normalized.
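The backward recursion (1.14) mirrors the forward one, stepping from the terminated end of the trellis toward the start. In this sketch the branch metrics $e^{+3} \approx 20.09$ and $e^{-3} \approx 0.0498$ and the state connections are hypothetical, code state 0 is the document’s state 1, and normalization is done by the sum of the β values at each step. That is a simpler alternative to the normalizer discussed in the next section; any positive constant applied at one time step cancels in the LLR ratio.

```python
import math

def backward_metrics(gammas, n_states, end_state=0, normalize=True):
    """Backward recursion (1.14): beta_{k-1}(s') = sum_s beta_k(s) gamma_k(s', s).

    gammas[k-1] maps (s_prev, s) -> gamma_k(s', s) for trellis section k = 1..N.
    Returns [beta_0, beta_1, ..., beta_N].
    """
    beta = [0.0] * n_states
    beta[end_state] = 1.0                 # beta_N(s): 1 at the terminating state, else 0
    betas = [beta]
    for gamma in reversed(gammas):        # walk from section N back to section 1
        prev = [0.0] * n_states
        for (s_prev, s), g in gamma.items():
            prev[s_prev] += betas[0][s] * g
        if normalize:
            total = sum(prev)
            prev = [b / total for b in prev]
        betas.insert(0, prev)
    return betas

# Hypothetical last section: states 0 and 1 both lead to states 0 and 2.
gamma_N = {(0, 0): math.exp(3.0), (1, 0): math.exp(-3.0),
           (0, 2): math.exp(-3.0), (1, 2): math.exp(3.0)}
raw = backward_metrics([gamma_N], n_states=4, normalize=False)
print(raw[0])   # beta_{N-1}: only states 0 and 1 can reach the end state
```

Only the branches that end in the terminating state contribute, because every other $\beta_N(s)$ is zero; this is how the known end of the trellis is pushed back through the sequence.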
Numerical issues


In order to prevent the overflow that can happen in the very large trellis of a Turbo code, $\alpha_k(s)$ and $\beta_{k-1}(s')$ need to be normalized. We normalize the forward metrics by their sum.

$$\tilde{\alpha}_k(s) = \frac{\alpha_k(s)}{\sum_{s} \alpha_k(s)} \qquad (1.15)$$

The backward metric is similarly normalized by the same quantity. Its formulation looks different, but by (1.13) the denominator below equals $\sum_s \alpha_k(s)$, the same term that normalized the forward metric at time k.

$$\tilde{\beta}_{k-1}(s') = \frac{\beta_{k-1}(s')}{\sum_{s} \sum_{s'} \alpha_{k-1}(s')\,\gamma_k(s', s)} \qquad (1.16)$$
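To see why normalization matters, the sketch below runs a toy two-state forward recursion for 1000 trellis sections with a fixed, hypothetical branch-metric matrix. Without normalization the metrics overflow to infinity in double precision; dividing by the sum at every step, as in (1.15), keeps them bounded. The function name and the matrix are assumptions for illustration.

```python
def run_forward(gamma, n_sections, normalize):
    """Toy two-state forward recursion with a fixed, hypothetical gamma[s'][s]."""
    alpha = [1.0, 0.0]                       # start in the first state
    for _ in range(n_sections):
        nxt = [sum(alpha[sp] * gamma[sp][s] for sp in range(2)) for s in range(2)]
        if normalize:
            total = sum(nxt)                 # divide by the sum over states, as in (1.15)
            nxt = [a / total for a in nxt]
        alpha = nxt
    return alpha

gamma = [[4.4817, 0.2231],
         [0.2231, 4.4817]]                   # hypothetical metrics gamma[s'][s]
raw = run_forward(gamma, 1000, normalize=False)
scaled = run_forward(gamma, 1000, normalize=True)
print(raw)      # both entries overflow to inf
print(scaled)   # bounded values that sum to 1
```

Here the total metric grows by a factor of about 4.7 per section, so a double overflows after a few hundred sections, far fewer than the thousands of sections in a typical Turbo code frame.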


How to calculate transition probabilities 


Computing transition metrics turns out to be the hardest part. To restate the definition of the transition metric from (1.10),

$$\gamma_k(s', s) = P(s, \mathbf{y}_k \mid s', \mathbf{y}_p)$$


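The transition itself does not depend on the past, so we can remove the dependency on the past and then apply Bayes’ theorem.

$$\gamma_k(s', s) = P(s, \mathbf{y}_k \mid s', \mathbf{y}_p) = P(s, \mathbf{y}_k \mid s') = P(\mathbf{y}_k \mid s', s)\,P(s \mid s') \qquad (1.17)$$

Since the move from s’ to s is caused by a specific input bit $u_k$, $P(s \mid s') = P(u_k)$, so

$$\gamma_k(s', s) = P(\mathbf{y}_k \mid s', s)\,P(u_k) \qquad (1.18)$$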


There are two terms in this final definition of the transition metric. Let’s take the last one first. Here $P(u_k)$ is the a-priori probability of the input bit. Let’s take its log-likelihood.

$$L(u_k) = \ln\frac{P(u_k = +1)}{P(u_k = -1)} = \ln\frac{P(u_k = +1)}{1 - P(u_k = +1)}$$


Exponentiating both sides, we get

$$e^{L(u_k)} = \frac{P(u_k = +1)}{1 - P(u_k = +1)}$$


From here, some clever algebra gives us 


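$$P(u_k = +1) = e^{L(u_k)}\,\bigl(1 - P(u_k = +1)\bigr)$$

$$P(u_k = +1) = \frac{e^{L(u_k)}}{1 + e^{L(u_k)}} = \frac{1}{1 + e^{-L(u_k)}} = \frac{e^{-L(u_k)/2}}{1 + e^{-L(u_k)}}\,e^{+L(u_k)/2}$$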


Now we get the general expression for both +1 and -1.

$$P(u_k = \pm 1) = \frac{e^{-L(u_k)/2}}{1 + e^{-L(u_k)}}\,e^{\pm L(u_k)/2} \qquad (1.19)$$


The first factor is common to both cases and is designated by

$$A_k = \frac{e^{-L(u_k)/2}}{1 + e^{-L(u_k)}}$$


The a-priori probability is now given by

$$P(u_k) = A_k\,e^{u_k L(u_k)/2}, \qquad u_k = \pm 1 \qquad (1.20)$$
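A short numeric check of (1.19) and (1.20): for any a-priori LLR, the two probabilities built from the common factor $A_k$ add to 1, and their log-ratio gives back $L(u_k)$. The function name is illustrative, and moderate values of L are assumed, since for very large |L| the exponentials would overflow.

```python
import math

def apriori_prob(u, L):
    """P(u_k = u) for u in {+1, -1}, from the a-priori LLR L via (1.19)-(1.20)."""
    A = math.exp(-L / 2) / (1 + math.exp(-L))   # common factor A_k
    return A * math.exp(u * L / 2)

for L in (-2.0, 0.0, 1.0):
    p_plus, p_minus = apriori_prob(+1, L), apriori_prob(-1, L)
    print(L, round(p_plus, 4), round(p_minus, 4), round(math.log(p_plus / p_minus), 4))
```

Writing the probability as $A_k e^{u_k L(u_k)/2}$ is what lets $A_k$ drop out later: it is the same for both values of $u_k$, so it cancels in the LLR.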


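Now for the second term in the transition metric expression of (1.18): $P(\mathbf{y}_k \mid s', s)$ can be written as

$$P(\mathbf{y}_k \mid s', s) = P(\mathbf{y}_k \mid \mathbf{c}_k) = \prod_{i=1}^{q} P(y_k^i \mid c_k^i)$$

where i = 1 is the systematic bit and i = 2, ..., q are the parity bits of one trellis branch. For rate 1/2, we can write this simply as

$$P(\mathbf{y}_k \mid \mathbf{c}_k) = P(y_k^{1,s} \mid c_k^{1,s})\,P(y_k^{2,p} \mid c_k^{2,p})$$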

This will turn out to be just a correlation of the received signal values with the trellis values, c. The received signal values $y_k^i$ are assumed to be corrupted by an AWGN process. The probability $P(y_k^i \mid c_k^i)$ in a Gaussian channel is given by
$$P(y_k^i \mid c_k^i) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\!\left(-\frac{(y_k^i - c_k^i)^2}{2\sigma_n^2}\right) \qquad (1.21)$$
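Applying (1.21) to each of the q received values in (1.18), we get

$$P(\mathbf{y}_k \mid s', s) = \frac{1}{(2\pi\sigma_n^2)^{q/2}} \exp\!\left(-\frac{(y_k^{1,s} - c_k^{1,s})^2}{2\sigma_n^2} - \sum_{i=2}^{q}\frac{(y_k^{i,p} - c_k^{i,p})^2}{2\sigma_n^2}\right)$$

Expanding this, we get

$$= \frac{1}{(2\pi\sigma_n^2)^{q/2}} \exp\!\left(-\frac{(y_k^{1,s})^2 + (c_k^{1,s})^2}{2\sigma_n^2} - \sum_{i=2}^{q}\frac{(y_k^{i,p})^2 + (c_k^{i,p})^2}{2\sigma_n^2}\right) \exp\!\left(\frac{y_k^{1,s} c_k^{1,s}}{\sigma_n^2} + \sum_{i=2}^{q}\frac{y_k^{i,p} c_k^{i,p}}{\sigma_n^2}\right)$$

The square of each c value (always +1 or -1) is equal to 1, since the squares of +1 and -1 are the same. So the first exponential does not depend on which branch is taken, and we get

$$P(\mathbf{y}_k \mid s', s) = B_k \exp\!\left(\frac{y_k^{1,s} c_k^{1,s}}{\sigma_n^2} + \sum_{i=2}^{q}\frac{y_k^{i,p} c_k^{i,p}}{\sigma_n^2}\right) \qquad (1.22)$$

where

$$B_k = \frac{1}{(2\pi\sigma_n^2)^{q/2}} \exp\!\left(-\frac{(y_k^{1,s})^2 + 1}{2\sigma_n^2} - \sum_{i=2}^{q}\frac{(y_k^{i,p})^2 + 1}{2\sigma_n^2}\right)$$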


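Now we can write the full equation for the transition metric by combining (1.22) and (1.20).

$$\gamma_k(s', s) = P(\mathbf{y}_k \mid s', s)\,P(u_k) = A_k B_k \exp\!\left(\frac{y_k^{1,s} c_k^{1,s}}{\sigma_n^2} + \sum_{i=2}^{q}\frac{y_k^{i,p} c_k^{i,p}}{\sigma_n^2}\right) e^{L(c_k)\,c_k^{1,s}/2} \qquad (1.23)$$

Here, since the code is systematic, $u_k = c_k^{1,s}$, and $L(c_k)$ denotes the a-priori LLR of $u_k$.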


$A_k$ and $B_k$ are constants that do not depend on the branch, so they are not important: when we put this expression in the LLR form, they cancel out. The index q = 2 for the rate 1/2 code we are using as an example.


Now we define

$$\sigma_n^2 = \frac{N_0}{2E_c} = \frac{1}{2 \cdot \text{code rate} \cdot E_b/N_0} = \frac{\rho}{2\,E_b/N_0} \qquad (1.24)$$

where ρ is the inverse of the code rate, i.e. equal to 2 for code rate = 1/2, and

$$\frac{E_c}{N_0} = \text{code rate} \cdot \frac{E_b}{N_0} \qquad (1.25)$$

where $E_b/N_0$ is the ratio of the un-coded bit energy to the noise PSD, $E_c$ is the coded bit energy, and $E_c$ = code rate × $E_b$.

Substituting $\sigma_n^2 = \rho/(2E_b/N_0)$ into (1.23), we have

$$\gamma_k(s', s) = A_k B_k \exp\!\left(\frac{2E_b}{\rho N_0}\,y_k^{1,s} c_k^{1,s} + \sum_{i=2}^{q}\frac{2E_b}{\rho N_0}\,y_k^{i,p} c_k^{i,p}\right) e^{L(c_k)\,c_k^{1,s}/2}$$

With $L_c = 4E_b/(\rho N_0) = 4E_c/N_0$, this becomes

$$\gamma_k(s', s) = A_k B_k \exp\!\left(\tfrac{1}{2} L(c_k)\,c_k^{1,s} + \tfrac{1}{2} L_c\,y_k^{1,s} c_k^{1,s}\right) \exp\!\left(\sum_{i=2}^{q} \tfrac{1}{2} L_c\,y_k^{i,p} c_k^{i,p}\right) \qquad (1.26)$$

The last factor depends only on the parity bits; we write it as

$$\gamma_k^{e}(s', s) = \exp\!\left(\sum_{i=2}^{q} \tfrac{1}{2} L_c\,y_k^{i,p} c_k^{i,p}\right)$$
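The channel reliability $L_c = 4E_c/N_0$ (equivalently $2/\sigma_n^2$) and the branch metric of (1.26) can be sketched as below, assuming unit-amplitude antipodal symbols (c = ±1) so that (1.24) holds. The constants $A_k B_k$ are omitted because they cancel in the LLR; the function names are hypothetical.

```python
import math

def channel_reliability(ebn0_db, rate):
    """Return (sigma_n^2, Lc) for unit-amplitude antipodal symbols."""
    ebn0 = 10 ** (ebn0_db / 10)      # Eb/N0 from dB to a linear ratio
    ecn0 = rate * ebn0               # (1.25): Ec/N0 = code rate * Eb/N0
    sigma2 = 1 / (2 * ecn0)          # (1.24): sigma_n^2 = N0 / (2 Ec)
    lc = 4 * ecn0                    # Lc = 4 Ec/N0, which equals 2 / sigma_n^2
    return sigma2, lc

def branch_metric(la, lc, y_sys, y_par, c_sys, c_par):
    """gamma_k(s', s) of (1.26) without A_k B_k, and its parity-only part gamma_k^e.

    la: a-priori LLR L(c_k) of the information bit
    y_sys: received systematic value; y_par: list of received parity values
    c_sys: branch systematic symbol (+1 or -1); c_par: list of branch parity symbols
    """
    g_e = math.exp(sum(0.5 * lc * y * c for y, c in zip(y_par, c_par)))
    g = math.exp(0.5 * la * c_sys + 0.5 * lc * y_sys * c_sys) * g_e
    return g, g_e

sigma2, lc = channel_reliability(0.0, 0.5)   # Eb/N0 = 0 dB, rate 1/2
print(sigma2, lc)                            # 1.0 2.0
```

For two branches that share the same parity symbols, the ratio of their γ values is $e^{L(c_k) + L_c y_k^{1,s}}$, which previews how the a-priori and channel terms separate out of the LLR.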


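In summary, the LLR of the information bit $u_k$ at time k is

$$L(u_k) = \ln\frac{\sum_{u_k = +1} \alpha_{k-1}(s')\,\gamma_k(s', s)\,\beta_k(s)}{\sum_{u_k = -1} \alpha_{k-1}(s')\,\gamma_k(s', s)\,\beta_k(s)} \qquad (1.27)$$

where

$$\alpha_k(s) = \sum_{s'} \alpha_{k-1}(s')\,\gamma_k(s', s), \qquad \alpha_0(s) = \begin{cases} 1 & \text{if } s = 1 \\ 0 & \text{otherwise} \end{cases}$$

$$\beta_{k-1}(s') = \sum_{s} \beta_k(s)\,\gamma_k(s', s), \qquad \beta_N(s) = \begin{cases} 1 & \text{if } s = 1 \\ 0 & \text{otherwise} \end{cases}$$

and, dropping the constants $A_k B_k$, which cancel in (1.27),

$$\gamma_k(s', s) = \exp\!\left(\tfrac{1}{2} L(c_k)\,c_k^{1,s} + \tfrac{1}{2} L_c\,y_k^{1,s} c_k^{1,s}\right) \exp\!\left(\sum_{i=2}^{q} \tfrac{1}{2} L_c\,y_k^{i,p} c_k^{i,p}\right) \qquad (1.30)$$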
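Substitute (1.30) into (1.27) to get

$$L(u_k) = \ln\frac{\sum_{u_k = +1} \alpha_{k-1}(s')\,\beta_k(s)\,\exp\!\left(\tfrac{1}{2} L(c_k)\,c_k^{1,s} + \tfrac{1}{2} L_c\,y_k^{1,s} c_k^{1,s}\right) \exp\!\left(\sum_{i=2}^{q} \tfrac{1}{2} L_c\,y_k^{i,p} c_k^{i,p}\right)}{\sum_{u_k = -1} \alpha_{k-1}(s')\,\beta_k(s)\,\exp\!\left(\tfrac{1}{2} L(c_k)\,c_k^{1,s} + \tfrac{1}{2} L_c\,y_k^{1,s} c_k^{1,s}\right) \exp\!\left(\sum_{i=2}^{q} \tfrac{1}{2} L_c\,y_k^{i,p} c_k^{i,p}\right)} \qquad (1.31)$$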


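If q = 2, the equation becomes

$$L(u_k) = \ln\frac{\sum_{u_k = +1} \alpha_{k-1}(s')\,\beta_k(s)\,\exp\!\left(\tfrac{1}{2} L(c_k)\,c_k^{1,s} + \tfrac{1}{2} L_c\,y_k^{1,s} c_k^{1,s}\right) \exp\!\left(\tfrac{1}{2} L_c\,y_k^{2,p} c_k^{2,p}\right)}{\sum_{u_k = -1} \alpha_{k-1}(s')\,\beta_k(s)\,\exp\!\left(\tfrac{1}{2} L(c_k)\,c_k^{1,s} + \tfrac{1}{2} L_c\,y_k^{1,s} c_k^{1,s}\right) \exp\!\left(\tfrac{1}{2} L_c\,y_k^{2,p} c_k^{2,p}\right)}$$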


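$L(c_k)$ and $L_c\,y_k^{1,s}$ can be pulled out of the sums because, within each sum, $c_k^{1,s}$ is constant: it is +1 for every branch in the numerator and -1 for every branch in the denominator. The factor 1/2 and $c_k^{1,s}$ then cancel, since $\ln\left(e^{X/2}/e^{-X/2}\right) = X$. We get

$$L(u_k) = \left[L(c_k) + L_c\,y_k^{1,s}\right] + \ln\frac{\sum_{u_k = +1} \alpha_{k-1}(s')\,\beta_k(s)\,\gamma_k^{e}(s', s)}{\sum_{u_k = -1} \alpha_{k-1}(s')\,\beta_k(s)\,\gamma_k^{e}(s', s)}$$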


where

$$\gamma_k^{e}(s', s) = \exp\!\left(\tfrac{1}{2}\sum_{i=2}^{q} L_c\,y_k^{i,p} c_k^{i,p}\right)$$


This equation can be broken into three distinct numbers:

$$L(u_k) = L_{\text{apriori}} + L_{\text{channel}} + L_{\text{extrinsic}} \qquad (1.32)$$


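$L(c_k)$ is the a-priori information about $u_k$, and $L_c\,y_k^{1,s}$ is the channel value, calculated from the knowledge of the SNR and the received signal. The third term is called the extrinsic L value: the new information about $u_k$ that the decoder derives from the parity bits and from the rest of the sequence.

$$L_{\text{extrinsic}} = \ln\frac{\sum_{u_k = +1} \alpha_{k-1}(s')\,\beta_k(s)\,\gamma_k^{e}(s', s)}{\sum_{u_k = -1} \alpha_{k-1}(s')\,\beta_k(s)\,\gamma_k^{e}(s', s)} \qquad (1.33)$$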


During each iteration, each decoder produces the extrinsic value using the first two numbers as input. The extrinsic value it produces becomes the a-priori input $L(c_k)$ of the next decoder (after interleaving or de-interleaving).


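The total LLR combines the channel value with the extrinsic values exchanged by the two decoders, $L_{12}^{e}$ (from decoder 1 to decoder 2) and $L_{21}^{e}$ (from decoder 2 to decoder 1):

$$L(u_k) = L_c\,y_k^{1,s} + L_{12}^{e}(u_k) + L_{21}^{e}(u_k)$$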


Eventually a decision is made about the bit in question by looking at the sign of the L value.

$$\hat{u}_k = \operatorname{sign}\{L(u_k)\}$$
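Putting (1.27) through (1.33) together, here is a compact probability-domain sketch of one MAP decoder pass for a single rate 1/2 RSC, returning both the full LLR and the extrinsic value, followed by the sign decision. Assumptions: a hypothetical four-state RSC (feedback 7, feedforward 5 octal) in place of the Table I code, two tail bits that terminate the trellis in state 0 (the document’s state 1), bit 1 mapped to +1, and α and β normalized by their sums at each step (any per-step constant cancels in the LLR). It is a single constituent decoder, not a full turbo decoder with interleaving, and all names are illustrative.

```python
import math

def step(state, u):
    """Hypothetical 4-state RSC (feedback 7, feedforward 5 octal).

    state = 2*s1 + s2; returns (next state, parity bit).
    """
    s1, s2 = state >> 1, state & 1
    a = u ^ s1 ^ s2                        # recursive feedback bit
    return (a << 1) | s1, a ^ s2

def rsc_encode_terminated(bits):
    """Encode bits, then append two tail bits that return the encoder to state 0."""
    state, sys_bits, par_bits = 0, [], []
    for k in range(len(bits) + 2):
        # data bit, or a tail bit chosen so that the feedback bit is 0
        u = bits[k] if k < len(bits) else (state >> 1) ^ (state & 1)
        state, p = step(state, u)
        sys_bits.append(u)
        par_bits.append(p)
    return sys_bits, par_bits

def map_decode(y_s, y_p, La, Lc):
    """One MAP (BCJR) pass: returns (L(u_k), L_extrinsic) per (1.27) and (1.33)."""
    N, S = len(y_s), 4
    # every trellis branch as (s', u, s, parity)
    branches = [(sp, u) + step(sp, u) for sp in range(S) for u in (0, 1)]

    def gammas(k, u, p):
        # (1.30) for section k (1-based), with symbols c = 2*bit - 1
        cs, cp = 2 * u - 1, 2 * p - 1
        g_e = math.exp(0.5 * Lc * y_p[k - 1] * cp)             # parity-only part
        g = math.exp(0.5 * (La[k - 1] + Lc * y_s[k - 1]) * cs) * g_e
        return g, g_e

    alpha = [[0.0] * S for _ in range(N + 1)]
    beta = [[0.0] * S for _ in range(N + 1)]
    alpha[0][0] = 1.0                      # trellis starts in state 0
    beta[N][0] = 1.0                       # and is terminated in state 0
    for k in range(1, N + 1):              # forward recursion, normalized by the sum
        for sp, u, s, p in branches:
            alpha[k][s] += alpha[k - 1][sp] * gammas(k, u, p)[0]
        total = sum(alpha[k])
        alpha[k] = [a / total for a in alpha[k]]
    for k in range(N, 0, -1):              # backward recursion, normalized by the sum
        for sp, u, s, p in branches:
            beta[k - 1][sp] += beta[k][s] * gammas(k, u, p)[0]
        total = sum(beta[k - 1])
        beta[k - 1] = [b / total for b in beta[k - 1]]

    L, Le = [], []
    for k in range(1, N + 1):
        full, ext = [0.0, 0.0], [0.0, 0.0]   # index 1: u_k = +1, index 0: u_k = -1
        for sp, u, s, p in branches:
            g, g_e = gammas(k, u, p)
            full[u] += alpha[k - 1][sp] * g * beta[k][s]
            ext[u] += alpha[k - 1][sp] * g_e * beta[k][s]
        L.append(math.log(full[1] / full[0]))    # (1.27)
        Le.append(math.log(ext[1] / ext[0]))     # (1.33)
    return L, Le

bits = [1, 0, 1, 1, 0, 0, 1, 0]
sys_bits, par_bits = rsc_encode_terminated(bits)
y_s = [2.0 * b - 1 for b in sys_bits]      # noiseless symbols, for illustration only
y_p = [2.0 * b - 1 for b in par_bits]
La = [0.0] * len(y_s)                       # no a-priori information on the first pass
L, Le = map_decode(y_s, y_p, La, Lc=4.0)
decided = [1 if l > 0 else 0 for l in L]    # u_hat_k = sign(L(u_k))
```

On noiseless symbols the decisions reproduce the encoded bits, and each L value splits into the a-priori, channel and extrinsic parts of (1.32). In a turbo decoder, `Le` (after interleaving or de-interleaving) would become the other decoder’s `La`.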


The process can continue until the extrinsic values stop changing by more than a preset threshold, or the algorithm can allow just a fixed number of iterations.


Extrinsic L value from 1 to 2; extrinsic L value from 2 to 1.


Figure 12 - Iterative nature of Turbo decoding. 
So that’s about it. I hope it was helpful. The next tutorial goes through the MAP algorithm in a step-by-step example.
Please contact me if you find errors.


Charan Langton 
www.complextoreal.com 